MapReduce Functions on GasDay Data Using Hadoop
Abstract
The GasDay lab at Marquette University forecasts natural gas consumption for 26 local distribution companies around the United States. We have accumulated a very large amount of data over the past 19 years, and the lab needs a way to select and process this data to gain insight into our forecasting methods. MapReduce is a pair of functions originally proposed by Jeffrey Dean and Sanjay Ghemawat of Google that allows users to process very large data sets. Hadoop is a software framework created by Doug Cutting in 2004 that allows scientists to distribute vast amounts of data across a cluster of many Linux machines and run MapReduce over the entire data set quickly. The quick turnaround facilitates processing of “Big Data.” As examples of MapReduce processing, we found the most accurate type of model, the most accurate forecast number, forecast accuracy over time, and the total amount of gas forecasted each year. This work matters because it lets us explore different High Performance Computing solutions for anyone with “Big Data.” It also allows GasDay to improve our forecasting models and grow continuously as a business.
∗This work was supported by the National Science Foundation under grant CNS-1063041.
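The MapReduce pattern described in the abstract can be illustrated with a minimal sketch of one of the listed examples, the total amount of gas forecasted each year. This is not the GasDay lab's code: the record layout, the sample values, and the function names `map_forecast`, `reduce_total`, and `run_local` are all assumptions made for illustration, and the job is simulated in a single Python process rather than run through Hadoop.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical record layout (an assumption, not GasDay's real schema):
# "YYYY-MM-DD,model_type,forecast_volume"
SAMPLE_RECORDS = [
    "2010-01-15,linear,120.5",
    "2010-07-04,neural,80.0",
    "2011-01-15,linear,130.25",
    "2011-02-01,neural,99.75",
]


def map_forecast(record: str) -> Iterator[tuple[str, float]]:
    """Map step: emit a (year, forecast volume) pair for one input record."""
    date, _model, volume = record.split(",")
    yield date[:4], float(volume)


def reduce_total(year: str, volumes: Iterable[float]) -> tuple[str, float]:
    """Reduce step: sum every forecast volume that shares a year key."""
    return year, sum(volumes)


def run_local(records: Iterable[str]) -> dict[str, float]:
    """Simulate map, shuffle (group by key), and reduce in one process."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for record in records:
        for year, volume in map_forecast(record):
            grouped[year].append(volume)
    # Sort the keys so results come out in year order.
    return dict(reduce_total(year, vols) for year, vols in sorted(grouped.items()))


if __name__ == "__main__":
    for year, total in run_local(SAMPLE_RECORDS).items():
        print(year, total)
```

On a real cluster the grouping step would be handled by the framework's shuffle phase, and the map and reduce functions would run on many machines in parallel; the sketch keeps only the logical structure.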
Related resources
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations, because the rapid development in the use of information technology in general, and network technology in particular, has led many organizations to make their applications available for use via electronic platforms hos...
Survey of Parallel Data Processing in Context with MapReduce
MapReduce is a parallel programming model and an associated implementation introduced by Google. In the programming model, a user specifies the computation by two functions, Map and Reduce. The underlying MapReduce library automatically parallelizes the computation, and handles complicated issues like data distribution, load balancing and fault tolerance. The original MapReduce implementation b...
Benchmarking and Performance studies of MapReduce / Hadoop Framework on Blue Waters Supercomputer
MapReduce is an emerging and widely used programming model for large-scale data-parallel applications that need to process large amounts of raw data. There are several implementations of the MapReduce framework, among which Apache Hadoop is the most commonly used open source implementation. These frameworks are rarely deployed on supercomputers as massive as Blue Waters. We want to evaluate ho...
A Common Compiler Framework for Big Data Languages: Motivation, Opportunities, and Benefits
We are in the era of Big Data and cluster computing. Data sizes have been growing at an exponential rate. At the same time, growth in computing power has been stagnating due to physical limits in processor technology. The only cost effective way to keep up with the growing data trend has been to harness multiple commodity computers in a shared-nothing configuration. Google, needing to manage ex...
Publication date: 2012